AITopics | cmdp problem

algorithm, artificial intelligence, inequality, (14 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Achieving $\tilde{O}(1/\epsilon)$ Sample Complexity for Constrained Markov Decision Process

Neural Information Processing Systems, Mar-21-2026, 14:23:49 GMT

We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making. In this problem, we are given finite resources and an MDP with unknown transition probabilities. At each stage, we take an action, collecting a reward and consuming some resources, both of which are unknown and must be learned over time. In this work, we take the first step towards deriving optimal problem-dependent guarantees for CMDP problems. We derive a logarithmic regret bound, which translates into an $O(\frac{1}{\Delta\cdot\epsilon}\cdot\log^2(1/\epsilon))$ sample complexity bound, where $\Delta$ is a problem-dependent parameter that is independent of $\epsilon$. Our bound improves upon the state-of-the-art $O(1/\epsilon^2)$ sample complexity for CMDP problems established in the previous literature in terms of the dependency on $\epsilon$. To achieve this advance, we develop a new framework for analyzing CMDP problems. Specifically, our algorithm operates in the primal space: we re-solve the primal LP for the CMDP problem at each period in an online manner, with adaptive remaining resource capacities. The key elements of our algorithm are: i) a characterization of the instance hardness via the LP basis; ii) an elimination procedure that identifies one optimal basis of the primal LP; and iii) a resolving procedure that adapts to the remaining resources and sticks to the characterized optimal basis.
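
To get a feel for the two rates quoted in the abstract, the sketch below evaluates only their growth terms, $1/\epsilon^2$ and $\frac{1}{\Delta\cdot\epsilon}\log^2(1/\epsilon)$, with all hidden constants dropped. The function names, the natural logarithm, and the value `delta = 0.1` are assumptions chosen for illustration; none of them comes from the paper, so the printed numbers indicate scaling only and are not actual sample counts.

```python
import math


def prior_growth_term(eps: float) -> float:
    """Growth term 1/eps^2 of the earlier O(1/eps^2) bound (constants dropped)."""
    return 1.0 / eps ** 2


def new_growth_term(eps: float, delta: float) -> float:
    """Growth term (1/(delta*eps)) * log^2(1/eps) of the new bound.

    Constants are dropped and the natural log is an assumption; the base
    only changes a constant factor inside the O-notation.
    """
    return (1.0 / (delta * eps)) * math.log(1.0 / eps) ** 2


if __name__ == "__main__":
    delta = 0.1  # hypothetical problem-dependent gap, for illustration only
    for eps in (1e-1, 1e-2, 1e-3, 1e-4, 1e-6):
        old = prior_growth_term(eps)
        new = new_growth_term(eps, delta)
        print(f"eps={eps:g}  1/eps^2={old:.3g}  new term={new:.3g}  ratio={old / new:.3g}")
```

Under these assumptions the new term is larger at a coarse accuracy such as `eps = 0.1` and becomes smaller as `eps` shrinks, which is consistent with the abstract's claim being about the dependency on $\epsilon$ rather than about every accuracy level.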

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Achieving $\tilde{O}(1/\epsilon)$ Sample Complexity for Constrained Markov Decision Process

Neural Information Processing Systems, Feb-16-2026, 14:41:10 GMT

We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

444d69470b24ded080183c907b711bbf-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 15:17:24 GMT

constraint, inequality, probability, (14 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

444d69470b24ded080183c907b711bbf-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 15:17:21 GMT

algorithm, constraint violation, sample complexity, (10 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

2022DOPEsuppl

Archana Bura

Neural Information Processing Systems, Feb-7-2026, 07:56:13 GMT

algorithm, inequality, occupancy measure, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.64)

A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Neural Information Processing Systems, Dec-24-2025, 03:02:48 GMT

As an important framework for safe reinforcement learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, an essential understanding of offline CMDP problems is still lacking, in terms of both algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving CMDP problems where only offline data are available.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

8f9f4eb32b9081a90f2a0b2627eb2a24-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 09:29:04 GMT

algorithm, denote, sample complexity, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.67)

444d69470b24ded080183c907b711bbf-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-14-2025, 12:22:30 GMT

constraint, inequality, probability, (14 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Neural Information Processing Systems, Aug-14-2025, 12:22:26 GMT

algorithm, constraint violation, sample complexity, (10 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
